Using Salient Words to Perform Categorization of Web Sites

Authors

  • Marek Trabalka
  • Mária Bieliková
Abstract

In this paper we focus on web site categorization. We compare some quantitative characteristics of existing web directories, analyze the vocabulary used in descriptions of web sites in the Yahoo web directory, and propose an approach to automatically categorize web sites. Our approach is based on the novel concept of salient words. Two realizations of the proposed concept are experimentally evaluated. The former uses words typical for just one category, while the latter uses words typical for several categories. The results show that a single vocabulary-based method is limited in its ability to properly categorize highly heterogeneous spaces such as the World Wide Web.
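
The abstract does not say how salient words are selected, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a word counts as salient when a chosen share of its occurrences falls within at most `max_categories` categories; the function names, the `min_share` and `min_count` thresholds, and the simple voting classifier are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def salient_words(docs_by_category, max_categories=1, min_share=0.8, min_count=2):
    """Map each salient word to the categories it is typical for.

    Assumed criterion (not taken from the paper): a word is salient if its
    `max_categories` most frequent categories account for at least
    `min_share` of all its occurrences, and it occurs at least `min_count` times.
    """
    counts = defaultdict(Counter)  # word -> Counter(category -> occurrences)
    for category, docs in docs_by_category.items():
        for doc in docs:
            for word in doc.lower().split():
                counts[word][category] += 1

    salient = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        if total < min_count:
            continue  # too rare to judge
        top = per_category.most_common(max_categories)
        if sum(n for _, n in top) / total >= min_share:
            salient[word] = {category for category, _ in top}
    return salient


def categorize(text, salient):
    """Pick the category with the most salient-word votes, or None if no word matches."""
    votes = Counter()
    for word in text.lower().split():
        for category in salient.get(word, ()):
            votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy site descriptions, invented for this example.
descriptions = {
    "Sports": ["football league scores", "football club news"],
    "Science": ["physics research news", "physics lab research"],
}
one_category = salient_words(descriptions, max_categories=1)
several_categories = salient_words(descriptions, max_categories=2)
```

With `max_categories=1` the word "news" is rejected because it is split evenly between the two toy categories, while with `max_categories=2` it is accepted as typical for both. This mirrors the two realizations named in the abstract only loosely.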

Similar resources

The Effects of the Meaningfulness of Salient Brand and Product-Related Text and Graphics on Web Site Recognition

Building on the associative strength of memory theory and previous studies on the effects of brand name suggestiveness on advertising effectiveness, two salient elements in a business web page, pictures (such as logos or graphics) and words (such as brand or product names), were examined in three experiments. Web sites where salient pictures and words had business meaning suggestive of brand or...

Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowle...

Positioning of Industries in Cyberspace: Evaluation of Web Sites Using Correspondence Analysis

In today’s extremely competitive markets it is crucial for companies to strategically position their brands, products and services relative to their competitors. With the emerging trend toward internationalization of companies, especially SMEs, and the growing use of the Internet in this regard, a great amount of attention has been turned to effective involvement of the Internet channel in the mar...

Text Categorization of Commercial Web Pages

In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both representation and classification tasks. Within text documents conceived as images, the regions of interest (RoI) containing information meaningful for categorization are identified with the support of a supervised neural...

The Influence of the Meaning of Pictures and Words on Web Page Recognition Performance

Firms spend high sums trying to make their “home” page as memorable as possible to attract repeat visits. For this purpose, fancy pictures and words are used to catch the attention of visitors. Interestingly, the effectiveness of all of this effort is nearly completely unknown. This study investigated how picture and word selections affected the recognition success rates of the sites visited by...

Publication date: 2002